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Abstract 

We present an algorithm for computing a Smith form with multipliers of a regular matrix 
polynomial over a field. This algorithm differs from previous ones in that it computes a local 
Smith form for each irreducible factor in the determinant separately and then combines them 
into a global Smith form, whereas other algorithms apply a sequence of unimodular row and 
column operations to the original matrix. The performance of the algorithm in exact arithmetic 
is reported for several test cases. 
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1. Introduction 

Canonical forms are a useful tool for classifying matrices, identifying their key proper- 
ties, and reducing complicated systems of equations to the de-coupled, scalar case. When 
working with matrix polynomials over a field K, one fundamental canonical form, the 
Smith form, is defined. It is a diagonalization 

A{X) = E{X)D{X)F{X) (1) 
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of the given matrix A{X) by unimodular matrices E{X) and F{X) such that the diagonal 
entries di{X) of D{X) are monic polynomials and di{X) is divisible by dj ijX) for i > 2. 
This factorization has various applications. The most common one 

[IE 111 involves 

solving the system of differential equations 

A(^)^ + --- + ^W§ + A(0)x=./(t), (2) 

where A'"' , . . . , A^"?-* are n x n matrices over C. For brevity, we denote this system by 

A{d/dt)x = f, where A{X) = A*") + A^^Uh h A(«)A«. Assume for simplicity that A{X) 

is regular, i.e. dct[A(A)] is not identically zero, and that (1) is a Smith form of A{X). The 
system (2) is then equivalent to 



/ Mi) 



\ 
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where y — F{d/dt)x{t) and g = E~^{d/dt)f{t). Note that E~^{X) is a matrix polynomial 
over C due to the unimodularity of E{X). This system splits into n independent scalar 
ordinary differential equations 



F^^{d/dt)y, where F^^{X) is also a matrix 



and the solution of (2) is then given by x 
polynomial over C. 

Another important application of the Smith form concerns the study of the algebraic 
structural properties of systems in linear control theory l^, 3 E^l • For example, the 
zeros and poles of a multivariable transfer function H{s) are revealed by the Smith- 
McMillan form of H{s), which is a close variant of the Smith form, but for rational (as 
opposed to polynomial) matrices. In many applications, one only needs to compute a 

261 have 



minimal basis for the kernel of a matrix polynomial. Specialized algorithms [15 
been developed for this sub-problem of the Smith form calculation. 

Smith forms of linear matrix polynomials (i.e. matrix pencils) are related to the con- 
cept of similarity of matrices. A fundamental theorem in matrix theory 0, 0] states that 
two square matrices A and B over a field K are similar if and only if their characteristic 
matrix polynomials XI ^ A and XI — B have the same Smith form D(X). Other appli- 
cations of this canonical form include finding the Frobenius form (22l . [23 | of a matrix A 
over a field by computing the invariant factors of the matrix pencil XI — A. 

Many algorithms have been developed for the computation of canonical forms of ma- 
trix polynomials in floating point arithmetic. One common approach involves finding an 
equivalent linear matrix pencil with the same finite zeros as the ori gina l matrix polyno- 
mial and a closely related Smith form [l^] ■ The Kronecker form [TqI l2Cll. of the matrix 
pencil is then computed to determine the eigenstructure of the original polynomial ma- 
trix. Another approach centers around computing the local spectral structure of a matrix 
polynomial at a single complex root, Aq, of the characteristic determinant fil. [25I . These 
methods usually boil down to computing kernels of nested Toeplitz matrices [251 [26| . 
One advantage of this local approach over the global matrix pencil approach is that only 
a few terms in an expansion of the matrix polynomial in powers of A — Ao are needed 
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to compute the spectral behavior. This can lead to a significant computational savings, 
and also allows for generalization from matrix polynomials to analytic matrix functions 
[1, [2^ . Such local canonical forms can be used to efficiently compute successive terms 
in the Laurent expansion of the inverse of an analytic matrix IlL l25| . Backward stability 
analysis of the effect of roundoff error may be found in 0, [2R 26| . A geometric approach 
to the perturbation theory of matrix pencils is discussed in [4]. 

The symbolic computation of Smith forms of matrices over Q[A] is also a widely studied 
topic. Kannan [isj gave a method for computing the Smith form with repeated trian- 
gul arizations of the matrix polynomial over Q. Kaltofen, Krishnamoorthy and Saunders 
□J gave the first polynomial time algorithm for the Smith form (without multipliers) us- 
ing the Chinese remainder theorem. A new class of probabilistic algorithms (the Monte 



Carlo algorithms) were proposed by Kaltofen, Krishnamoorthy and Saunders [111 . [12 1 . 
They showed that by multiplying the given matrix polynomial by a randomly generated 
constant matrix on the right, the Smith form with multipliers is obtained with high prob- 
ability by two steps of computation of the Hermite form. A Las Vegas algorithm given by 
Storjohann and Labahn [itI [isj significantly improved the complexity by rapidly check- 



ing the correctness of the result of the KKS algorithm. Villard [211, |23| established the 



first deterministic polynomial-time method to obtain the Smith form with multipliers 
by explicitly computing a good-conditioning matrix that replaces the random constant 
matrix in the Las Vegas algorithm. Villard also applied the method of Marlin, Labhalla 
and Lombardi |14| to obtain useful complexity bounds for the algorithm. 

We propose a new deterministic algorithm for the symbolic computation of Smith 
forms of matrix polynomials over a field in Section 3. Our approach differs from previous 
methods in that we begin by constructing local diagonal forms that we later combine 
to obtain a (global) post-multiplier. Although we do not discuss complexity bounds, we 
compare the performance of our algorithm to Villard's method with good conditioning 
in Section 4, and discuss the reasons for the increase in speed. The new algorithm is also 
easy to parallelize. In Appendix A, we present an algebraic framework that connects this 
work to [25I , and give a variant of the algorithm in which all operations are done in the 
field K rather than manipulating polynomials directly. 

As mentioned above, local canonical forms have been used successfully to study the 
structure of a matrix polynomial near a single root Aq S C of the characteristic deter- 
minant. An important point that has been neglected in the literature is that these roots 
Ao may not be expressible in radicals, or may involve such complicated expressions that 
current algorithms can only be carried out in floating point arithmetic. A major goal of 
this paper is to develop a machinery for computing local forms for all the complex roots 
of a Q-irreducible factor p(A) of the characteristic determinant simultaneously, without 
having to resort to floating point arithmetic at each root separately. This is done by 
working over the fields Q or Q + iQ rather than R or C when computing local forms. 



2. Preliminaries 

In this section, we describe the theory of Smith forms of matrix polynomials over a 
field K, which follows the definition in (3] over C. In practice, K will be Q, Q -I- iQ, 
K, or C, but it is convenient to deal with all these cases simultaneously. We also give a 
brief review of the theory of Jordan chains as well as Bezout's identity, which play an 
important role in our algorithm for computing Smith forms of matrix polynomials. 
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2.1. Smith Forms 



Suppose ^(A) = X]fc=o ^'■'^■''^'^ is an n X n matrix polynomial, where A^'^^ are n x n 
matrices whose entries are in a field K. Assmning that A{X) is regular, i.e. the determinant 
of A{X) is not identically zero, the following theorem is proved in [7[ (for K = C). 

Theorem 1. There exist matrix polynomials E{X) and F{X) over K of size n x n, with 
constant nonzero determinants, such that 

A{X) ^ E{X)D{X)F{X), i?(A)=diag[di(A),...,d„(A)], (3) 

where D{X) is a diagonal matrix with monic scalar polynomials di(X) over K such that 
di{X) is divisible by di-i{X). 

Since E{X) and F{X) have constant nonzero determinants, (3) is equivalent to 

U{X)A{X)V{X) = ^(A), (4) 
where U{X) := E{X)~^ and V{X) F{X)~^ arc also matrix polynomials over K. 

Definition 2. The representation in (3) or (4), or often D{X) alone, is called a Smith form 
of A{X). The matrices U{X), V{X) are known as multipliers. Square matrix polynomials 
with constant nonzero determinants like E(X) and F{X) are called unimodular. 

The diagonal matrix D(X) in the Smith form is unique, while the representation (3) 
is not. Suppose that 

A(A) := det[A(A)] (5) 
can be decomposed into prime elements pi(A), . . . ,pi{X) in the principal ideal domain 
K[X], that is, A(A) = cJlj^i Pi(A)''^ where c is in the field K, Pj{X) is monic and 
irreducible, and Hj are positive integers for j — 1, . . . , L Then the di{X) are given by 

I 

for some integers < kji < ■ • ■ < Hjn satisfying X]r=i '^ji ~ ^'^'^ j = ■ ■ A- 

We now define a local Smith form for A{X) at p{X). Let p{X) = Pj{X) be one of the 
irreducible factors of A(A) and define Ui = Kji, /i = Kj. Generalizing the case that 
p{X) = A — Aj, we call /i the algebraic multiplicity oip{X). 

Theorem 3. Suppose A{X) is an n x n matrix over K[X^ and p{X) is an irreducible 
factor o/ A(A). There exist nxn matrix polynomials E{X) and F{X) such that 

A{X) ^ E{X)D{X)F{X), i?(A)=diag[p(A)"\...,p(A)""], (6) 

where < ai < • • • < q;„ are non-negative integers and p{X) does not divide dct[i?(A)] 
or det[i^(A)]. 

E{X) and F{X) are not uniquely determined in a local Smith form. In particular, we 
can impose the additional requirement that F{X) be unimodular by absorbing the missing 
parts of D{X) in Theorem 1 into E{X). Then the local Smith form of A{X) at p{X) is given 
by 

A{X)V{X) ^ E{X)D{X), (7) 
where V{X) :— F{X)~^ is a matrix polynomial. 
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2.2. Multiplication and division in R/pR 



We define R = K[X] and M = R" . Note tliat i? is a principal ideal domain and M 
is a free i?-module of ranlc n. Suppose p is a prime element in R. Since p is irreducible, 
R/pR is a field and M/pM is a vector space over this field. 

Multiplication and division in R/pR are easily carried out using the companion matrix 
of p. If we set s := degp and define 7 : R/pR by 



7(a;)(A) = x 



(0) 



A" 



-i^(.-i) 



-pR, 



r(O). 



(s-l) 



) G K' 



we can pull back the field structure of R/pR to to obtain 



xy = ^{x){S)y 
and x/y = [y, Sy, 



[x^°h + 

= [y,Sy, . . 

. , S''~^y]~^x, where 

/o 

1 



x^'^S+--- 
■,S'-'y]x 



[x, Sx, . . . , S'^~^x]y 



(8) 



(9) 



S 



Vo 



.. 

■. 
1 



-0-3-2 



(10) 



is the companion matrix of p(A) = oq + aiA + • • • + Os-iA"^^ + A*. Note that S represents 
multiplication by A in R/pR. The matrix [y, Sy, . . . , 5""^?/] is invertible when y 7^ since 
a non-trivial vector x in its kernel would lead to non-zero polynomials 7(x), 7(2/) G R/pR 
whose product is zero (mod p), which is impossible as p is irreducible. 



2.3. .Jordan Chains 



Finding a local Smith form of a matrix polynomial over C at p(A) = A — Ao is equivalent 
to finding a canonical system of Jordan chains [l,[25j for A{X) at Aq. We now generalize 
the notion of Jordan chain to the case of an irreducible polynomial over a field K. 

Definition 4. Suppose A{\) is an n x n matrix polynomial over a field /v , p(A) is 
irreducible in K[X], and a > 1 is an integer. A vector polynomial a:;(A) E M = A'[A]" is 
called a root function of order a for A(X) at p{X) if 

A{X)x{X) = 0{p{Xr) (11) 

and p{X) \ x{X). The meaning of (11) is that each component of j4(A)a;(A) is divisible by 
p(A)". If the root function x{X) has the form 

x{X) = (A) + j3(A)x(i) (A) + ■ • • + p(A)"-ix("-i) (A) (12) 

with dega;*^''-'(A) < s := degp(A), the coefficients x'^'^'(A) are said to form a Jordan chain 
of length a for ^(A) at p{X). A root function can always be converted to the form (12) 
by truncating or zero-padding its expansion in powers of p{X). If K can be embedded in 
C, (11) implies that over C, x{X) is a root function of ^(A) of order a at each root Xj of 
p(A) simultaneously. 



5 



Definition 5. Several vector polynomials {xj{X)}'^^i form a system of root functions at 
p{X) if 

1. A{X)x,{X) = O(p(A)"0, («j > 1, 1 < i < 

2. The set {xj{X)}j^i is linearly independent in M/pM over R/pR, ^^'^^ 

where i? = A' [A], A/ = i?", = x-, + pM . 

It is called canonical if (1) = dimker A, where A is the linear operator on AI/pM in- 
duced by A{X); (2) xi(A) is a root function of maximal order ai; and (3) for i > 1, Xi{X) 
has maximal order ai among all root functions .t(A) £ M such that x is linearly inde- 
pendent of ii, . . . in M/pM. The integers ai > ■ ■ ■ > a^, are uniquely determined 
by A{X). We call the geometric multiplicity of p(A). 

Definition 6. An extended system of root functions a;i(A),. . . ,a;„(A) is a collection of 
vector polynomials satisfying (13) with ly replaced by n and aj allowed to be zero. The 
extended system is said to be canonical if, as before, the orders aj are chosen to be 
maximal among root functions not in the span of previous root functions in M/pM. 
The resulting sequence of numbers ai > ■ ■ ■ > a^ > a,y+i = • ■ • = a„ = is uniquely 
determined by A{X). 

Given such a system (not necessarily canonical), we define the matrices 

1/(A) = [xi(A),...,x„(A)], (14) 
D(A)=diag[p(A)"\...,p(A)""], (15) 
E{X) ^ A{X)V{X)D{X)-\ (16) 

E{X) is a polynomial since column j of A{X)V{X) is divisible by p{X)°'^ . The following 
theorem shows that aside from a reversal of the convention for ordering the aj , finding a 
local Smith form is equivalent to finding an extended canonical system of root functions: 

Theorem 7. The following three conditions are equivalent: 

(1) the columns Xj (A) ofV{X) form an extended canonical system of root functions 
for A{X) at p{X) (up to a permutation of columns). 

(2) p{X)\det[E{X)]. 

(3) '^j ~ where fi is the algebraic multiplicity of p{X) in A(A). 

This theorem is proved e.g. in Q for the case that K = C. The proof over a general 
field K is identical, except that the following lemma is used in place of invertibility of 
E{Xq). This lemma also plays a fundamental role in our construction of Jordan chains 
and local Smith forms. 

Lemma 8. Suppose K is a field, p is an irreducible polynomial in R — K[X], and 
E = [yi, . . . , y„] is an n X n matrix with columns yj £ M ~ R" . Then p \ det{E) <^ 
{yi, . . . ,2/n} are linearly independent in M/pM over R/pR. 

Proof. The ijj are linearly independent iff the determinant of E (considered as an n x 
matrix with entries in the field R/pR) is non-zero. But 

det i; = det £;+ pi?, (17) 

where det E is computed over R. The result follows. □ 
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2.^. Bezout's Identity 

As K[X\ is a principal ideal domain, Bezout's Identity holds, which is our main tool 
for combining local Smith forms into a single global Smith form. We define the notation 
gcd(/i, . . . , /i) to be if each fj is zero, and the monic greatest common divisor (GCD) 
of fi, . . . ,fi over K[X], otherwise. 

Theorem 9. (Bezout's Identity) For any two polynomials fi and f2 in K[X], where K 
is a field, there exist polynomials gi and g2 in i^[A] such that 

gi/l+.92./2=gcd(/i,/2). (18) 

Bezout's Identity can be extended to combinations of more than two polynomials: 

Theorem 10. (Generalized Bezout's Identity) For any scalar polynomials /i, . . . , /; in 
K[\], there exist polynomials gi, . . . ,gi in K[X\ such that 



f 3 = gcd(./i,...,//). 
The polynomials gj are called the Bezout coefficients o/{/i, , 



,fi}- 



In particular, suppose we have I distinct prime elements {pi, . . . ,pi} in if [A], and 



fj is given by fj = Uk^jPk^ where /3i, 

notation Ofe^i^ i indicates a product over all indices k ~ 1, . . . ,1 except k 
gcd (/i, . . . , fi) = 1, and we can find gi, . . . , gi in K[X] such that 



Pi are given positive integers and the 

j. Then 



(19) 



In this case, the polynomials gj are uniquely determined by requiring deg{gj) < Sj/3j, 
where Sj = deg(pj). The formula (19) modulo pk shows that gk is not divisible by pk- 

The Bezout coefficients are easily computed using the extended Euclidean algorithm 
0. In practice, we use MatrixPolynomialAlgebra[HermiteForm] in Maple to find a unimod- 
ular matrix Q such that 



Q 



h 



(20) 



where r = gcd(/i, . . . , fi) ~ 1. The first row of Q is [gi, . . . , gi]. One could avoid com- 
puting the remaining rows of Q by storing the sequence of elementary unimodular oper- 
ations required to reduce [/i; . . . ; /;] to [r; 0; . . . ; 0] and applying them to the row vector 
[1, 0, . . . , 0] from the right to obtain [gi, . . . ,gi]. 



3. An Algorithm for Computing a (global) Smith Form 

In this section, we describe an algorithm for computing a Smith form of a regular 
nxn matrix polynomial A{X) over a field K. We have in mind the case where K ~ C,M., 
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Q or Q + C C, but the construction works for any field. The basic procedure follows 
several steps, which will be explained further below: 

• Step 0. Compute A(A) = det[A(A)] and decompose it into irreducible monic factors in 
K[X], 

A(A) = const -pi {X)"' ...pi [Xf . (21) 

• Step 1. Compute a local Smith form 

A{X)V,{X) - i?,(A)diag [p,{XT^\. . . ,P,(A)«-] (22) 

for each factor Pj{X) of A(A). 

• Step 2. Find a linear combination Bn{X) = Yl]=i djWfjW^jW using the Bczout 

coefficients of /j(A) ~ Yi'^k^j Pk{X)'^^'^ so that the columns of Bn{X) form an extended 
canonical system of root functions for A{X) with respect to each Pj{X). 

• Step 3. Eliminate extraneous zeros from det [j4(A)-B„(A)] by finding a unimodular 
matrix ^(A) such that Bi{X) = V {X)^^ Bn{X) is lower triangular. We will show that 
^(A)y(A) is then of the form £:(A)i:'(A) with £:(A) unimodular and ^(A) as in (3). 

Remark 1 1 . Once the local Smith forms arc known, the diagonal entries of the matrix 
polynomial D(X) are given by 

d^{X)^\{p,{Xr^^, i = l,...,n. 

This allows us to order the columns once and for all in Step 2. 
3.1. A Local Smith Form Algorithm (Step 1) 

In this section, we show how to generalize the construction in (25l | for finding a canon- 
ical system of Jordan chains for an analytic matrix function A{X) over C at Aq = to 
finding a local Smith form for a matrix polynomial A{X) with respect to an irreducible 
factor p{X) of A(A) = dct[^(A)]. The new algorithm reduces to the "exact arithmetic" 
version of the previous algorithm when p{X) = X. In Appendix A, we present a variant 
of the algorithm that is easier to implement than the current approach, and is closer in 
spirit to the construction in [H], but is less efficient by a factor of s = dcgp. 

Our goal is to find matrices V[X) and E{X) such that p(A) does not divide det[y(A)] 
or det[i?(A)], and such that 

A{X)V{X)^E{X)D{X), 2?(A)=diag[p(A)"S...,p(A)""], (23) 

where < ai < • • • < q;,i. In our construction, V{X) will be unimodular, which reduces 
the work in Step 3 of the high level algorithm, the step in which extraneous zeros are 
removed from the determinant of the combined local Smith forms. 

We start with ^(A) = /nxn and perform a sequence of column operations on V{X) 
that preserve its determinant (up to a sign) and systematically increase the orders ai in 
D{X) in (23) until det[£^(A)] no longer contains a factor of p{X). This can be considered 
a "breadth first" construction of a canonical system of Jordan chains, in contrast to the 
"depth first" procedure described in Definition 5 above. 

The basic algorithm is presented in Figure 1. The idea is to run through the columns 
of V in turn and "accept" columns whenever the leading term of the residual A{X)xi{X) 
is linearly independent of its predecessors; otherwise we find a linear combination of 
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Algorithm 1. (Local Smith form, preliminary version) 

fc = 0, i = l, y = [xi, . . . ,a;„] = /„xn 
while i < n 

Tk-i = n + 1 — i Tk-i '■= dim. of space of J. chains of length > k 

for j = l,...,rfc_i 

Hi = Tem{qiio{Axi, p'') , p) define yi so Axi = p'^yi + 0(p''+^) 

if the set {yi, . . . , y^} is linearly independent in M jpM over R/pR 

Ui = k, i = i + 1 accept Xi and yi, define at 

else 

find hi, . . . , dj_i G R/pR so that yi - o-mym = 

(new) (o/rf) Y^i— 1 k~arn 

•^tmp — -^i^ -^771 — -^m+li — i, ... ,71 1), Xji — ^tmp 

end if 
end for j 
k = k + l 
end while 

l3 = k — I, rp = P := Oin ~ maximal Jordan chain length 



Fig. 1. Algorithm for computing a local Smith form. Here quo(-, •) and rem(-, ■) are the quotient 
and remainder of polynomials: g = quo(/, p), r = rem(/, p) -i^ f = gp + r, degr < degp. 

previously accepted columns to cancel this leading term and cyclically rotate the column 
to the end for further processing. Note that for each k, we cycle through each unaccepted 
column exactly once: after rotating a column to the end, it will not become active again 
until k has increased by one. At the start of the while loop, we have the invariants 

(1) AXfn is divisible by p'^, {i < ni < n). 

(2) ^x™ =p"'"y™ + 0(p"™+i), {l<m<i). 

(3) if i > 2 then {ymYmLi is linearly independent in M/pM over R/pR. 

The third property is guaranteed by the if statement, and the second property follows 
from the first due to the definition of a; and yi in the algorithm. The first property is 
obviously true when fc = 0; it continues to hold each time k is incremented due to step 
after which Ax["'™^ is divisible by p^~^^: 

i-1 i-1 



M"'"^ _ ^ p'^-^'-a„Ax,n=p''y^ + 0{p^+^) - P'""'"am(p""ym + 0{p^-^+^)) 

m— 1 m— 1 

m— 1 

This equation is independent of which polynomials G R are chosen to represent 
dm e R/pR, but different choices will lead to different (equally valid) Smith forms; in 
practice, we choose the unique representatives such that degOm < s, where 

s = degp. (24) 

This choice of the a,„ leads to two additional invariants of the while loop, namely 
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(4) degXm < niax(sam — 1, 0), (1 < to < i), 

(5) deg < niax(sfc — 1, 0), {i<'m<n), 
which are easily proved inductively by noting that 

dcg(/-"'"a™x™) < s{k - a™) + (s - 1) + dcg(a;„) < s{k + 1) - 1. (25) 

The while loop eventually terminates, for at the end of each loop (after k has been 
incremented) we have produced a unimodular matrix V{\) such that 

A{X)V{\)^E{\)D{\), i? = diag[KS-..,p"'-S/^_^]. (26) 

Tk^i times 

Hence, the algorithm must terminate before k exceeds the algebraic multiplicity /i of p{\) 
in A(A): 

fc< (E™ii«0+(" + l-*)fc<M, A(A) = /(A)p(A)^ p\f. (27) 

In fact, we can avoid the last iteration of the while loop if we change the test to 

while [( I]mii oii) + {n+\- i)k] < i^l 
and change the last line to 

P = k, Um = k, [i < m < n), "T/j-i = n + 1 — i, = 0. 

We know the remaining columns of V will be accepted without having to compute the 
remaining yi or check them for linear independence. When the algorithm terminates, we 
will have found a unimodular matrix ^(A) satisfying (23) such that the columns of 

i^(A) = [2)i(A),...,y„(A)] 

are linearly independent in M/pM over R/pR. By Lemma 8, p{X) | det[i?(A)], as required. 

To implement the algorithm, we must find an efficient way to compute yi, test for 
linear independence in M/pM , find the coefficients am to cancel the leading term of the 



residual, and update Xi . Motivated by the construction in [2q , we interpret the loop over 
j in Algorithm 1 as a single nullspace calculation. 

To this end, we define Ri = {a € R : dega < 1} and Mi = i?", both viewed as vector 
spaces over K. Then we have an isomorphism A of vector spaces over K 

A(x(°);...;x('=-i)) = x(°)+p.t(i) + ...+/-1x('=-i). ^ ^ 

At times it will be convenient to identify Ris with R/p^R and Mi^ with M/p^M to obtain 
ring and module structures for these spaces. We also expand 

A = A^°^ + pA^^^ + ■ ■ ■ + p" A^'i\ (29) 

where A^^^ is an ?i x n matrix with entries in Rg. 

By invariants (4) and (5) of the while loop in Algorithm 1, we may write Xi = 
A(a;|''^; . . . ; x^°'^) with a = max(fc — 1,0). Since Axi is divisible by p^ , we have 

k k-l 

yi = rem(quo(yla;i,p''),p) = ^rcu\{A'^^^^^ x'f\p) + ^qa.o{A'^^''^~''^ x^^\p). (30) 

3=0 3=0 
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The matrix- vector multiplications A^''~^'>x^''^ are done in the ring R (leading to vector 
polynomials of degree < 2s — 2) before the quotient and remainder are taken. When 
k = 0, the second sum should be omitted, and when A: > 1, the j = k term in the first 
sum can be dropped since x^^^ = in the algorithm. It is convenient to write (30) in 
matrix form. If A: = we have 

[yi,...,y„]=A("). (31) 
If fc > 1, suppose we have already computed the nk x r^^i matrix Xf.^i with columns 

Xfe_i(:,m + l-i) = (40);...;x(„^-i)), i<m<n. (32) 

Note that K{Xk-i) (acting column by column) contains the last Vk-i columns of V{\) 
at the start of the while loop in Algorithm 1. Then by (30), 

[y„ . . . ,2/„] = rem([AW, . . . , A^^'^]Xu-up) + quo([A(^-i\ . . . , AW]Xfc_i,p). (33) 

As before, the matrix multiplications are done in the ring R before the quotient and 
remainder are computed. The components of each belong to Rs . 
Next we define the auxiliary matrices 

A = < r n (34) 

\[A-i , [y^,...,yn]], l<fc</3-l. 

and compute the reduced row-echelon form of Ak using Gauss- Jordan elimination over 
the field R/pR. The reduced row-echelon form of Ak can be interpreted as a tableau 
telling which columns of Ak are linearly independent of their predecessors (the accepted 
columns), and also giving the linear combination of previously accepted columns that 
will annihilate a linearly dependent column. On the first iteration (with fc = 0), step * 
in Algorithm 1 will build up the matrix 

Xo=null(io), (35) 

where null(-) is the standard algorithm for computing a basis for the nullspace of a 
matrix from the reduced row-echelon form (followed by a truncation to replace elements 
in R/pR with their representatives in Rs). But rather than rotating these columns to 
the end as in Algorithm 1, we now append the corresponding yi to the end of Ak-i to 
form Ak for fc > 1. The "dead" columns left behind (not accepted, not active) serve only 
as placeholders, causing the resulting matrices Ak to be nested. We use rref(-) to denote 
the reduced row-echelon form of a matrix polynomial. The leading columns of rref(^fc) 
will then coincide with rref(„4fc-i), and the nullspace matrices will also be nested. We 
denote the new columns of null(A) beyond those of mi\\{Ak-i) by [Yfe; Uk]'- 



nun(A)- (36) 



Xo Yi ••■ Yk^i Yk 
[;7i;0] ••• [C/fe-i;0] Uk, 
Note that Ak is n x (n + Rk-i), where 

R-i^O, Rk^ro + --- + rk=dimkcTAk, (fc > 0). (37) 
We also see that Xq is n x rg, Yk is n x rk, Uk is I'k-i x rk, and 

rk<rk-u (fc>0). (38) 
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This inequality is due to the fact that the dimension of the kernel cannot increase by 
more than the number of columns added. 

If column i of Ak is linearly dependent on its predecessors, the coefficients a,„ used 
in step * of Algorithm 1 are precisely the (truncations of the) coefficients that appear in 
column i of rref(^fc). As shown in Figure 2, the corresponding null vector (i.e. column 
of [Yfe; Uk]) contains the negatives of these coefficients in the rows corresponding to the 
previously accepted columns of Ak, followed by a 1 in row i. Thus, in step *, if fc > 1 
and we write Xm ~ A(a;m^; ■ • ■ ; a;™'') with a ~ max(Q!m — 1,0), the update 



X,: — X^ ^ P CLm.Xrr)., (^m.Xjri. — Afz^ \ . . . , 



1 

^ ^ P ^m-^m; CLfyiX^n, — A( 

m— 1 

rem(am2;m\p), j = 0, 

'^■'^ = \ T:em.{amXm ,p) + qyio{arnXm~^\p), I < i <ar. 

qno{amXm~^\p), j = am and a™ > 0, 



(39) 



is equivalent to 

Xk = iHx^i)Yk + rem ([t^-V(^o) , • ■ • , tV(^fe-i)] Uk , p) 

+ quo([t'=(Xo), ... , LHXk-i)]Uk,p), 

where l, p : {Msf — > {MsY'^^ act column by column, padding them with zeros: 

l{x) = [0; x], p{x) = [x; 0], x e (A/,)', G M,. (40) 

Here AtA~^ is multiplication by p, which embeds Mis — M/p^M in M(;+i)5 ~ M/p^+'^M 
as a module over R, while p is an embedding of vector spaces over K (but not an R- 
module morphism). If we define the matrices Xg = and 




Xi 



[k > 1), (41) 



then (39) simply becomes 

Xk = [rem(Xfc_iC/fe,p); Ffc] + quo(/,(Xfc_i[/fc),p). (42) 

As in (33) above, the matrix multiplications arc done in the ring R before the quotient 
and remainder are computed to obtain Xk- Finally, we line up the columns of Xk-i with 
the last rk-i columns of Ak and extract (i.e. accept) columns of Xk-i that correspond 
to new, linearly independent columns of Ak- We denote the matrix of extracted columns 
by Xk-i- At the completion of the algorithm, the unimodular matrix V{X) that puts 
A{X) in local Smith form is given by 

F(A) = [A(X_i),---,A(X^_i)]. (43) 

The final algorithm is presented in Figure 3. In the step marked •, we can avoid re- 
computing the reduced row-echelon form of the first n + Rk-2 columns of Ak by storing 
the sequence of Gauss- Jordan transformations Q that reduced Ak-i to row-echelon form. 
To compute \Yk ; Uk] , we need only apply these transformations to the new columns of 
Ak and then proceed with the row-reduction algorithm on these final columns. Also, if 
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Fig. 2. The reduced row-echelon form of Afi contains all the information necessary to construct 
V^(A) = [A(X_i), . . . , A(Xs-i)]. An arrow from a column [v;u\ of [Yfe; indicates that the 
vector ([rem(Xfc_i?i,p); v] + quo(t(Xfc_iM),p)) is a column of Xk- 



Algorithm 2. (Local Smith form, final version) 


fc = 0, =0, A = 




Xo = Xo = null(A) 




To = -Ro = mim_cols(Xo) 


(number of columns) 




(columns ji of rref (A) start new rows) 


while < /i 


(/i = algebraic multiplicity of p) 


fc = fc+ 1 




A = [A-i, rem([AW,.. 


, A(l)]Xfe_i,p) + quo([A'=-l), . . . , A(0)]Xfc_i,p)] 


• [Yfe ; Uk] ~ new columns of 


null(A) beyond those of null(^fe_i) 


Tfe = num_cols([/fe), 


{Uk is Rk-i X Tk) 


= Rk-i+ru 




Xk = [rcm(Xfc_iC/fe,p); Yfc] + quo(t(Xfc_i[/fc),_p) (X^ is n(fc + 1) x r^) 


Xfc = [i(Xfc_i),Xfe] 


(Xfc is n(fc + 1) X Rk) 




i-r J), (columns n + Rk-2 + ji of 


end while 


rref(A) start new rows) 


^ = fc+ 1 


(maximal Jordan chain length) 






V{\) = [A(X_i),...,A(X^_i)] 





Fig. 3. Algorithm for computing a unimodular local Smith form. 



is large and sparse, rather than reducing to row-echelon form, one could find kernels 
using an LU factorization designed to handle singular matrices. This would allow the use 
of graph theory (clique analysis) to choose pivots in the Gaussian elimination procedure 
to minimize fill-in. We also note that if A(A) contains only one irreducible factor, the 
local Smith form is a (global) Smith form of A{X). 
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3.2. From Local to Global (Step 2) 

Now that we have a local Smith form (22) for every irreducible factor pj (A) of A(A), we 
can use the extended Euclidean algorithm to obtain a family of polynomials {(7j(A)}'^]^ 
with deg((7j(A)) < SjKjn, where Sj = deg{pj), such that 



E n p^i^y 

j=l L k=l,k^j 



1, 



(44) 



where pj{X)'^^" is the last entry in the diagonal matrix of the local Smith form at Pj{X). 
The integers Kj„ are positive. We define a matrix polynomial B„{X) via 



I p I 
The main result of this section is stated as follows. 



(45) 



Proposition 12. The matrix polynomial Bn{X) in (45) has two key properties: 

(1) Let bni{X) be the ith column of Bn{X). Then A{X)bni{X) is divisible by di{X), where 
di{X) = Yi'^j^iPjW^'^ ^'^s diagonal entry in D{X) of the Smith form. 

(2) det[i3„(A)] is not divisible by Pj{X) for j = 1, . . . , L 

Proof. (1) Let Uji(A) be the ith column of V,(A). Then A{X)vji{X) is divisible by pj(A)''^' 
and 

I r I 



gj{X)vji{X). 



Since Kj„ > Kjt for 1 < i <n and I < j < I, A{X)bni{X) is divisible by di{X). 

(2) The local Smith form construction ensures that Pj{X) \ det[V,(A)] for each 1 < j < 
I. Equation (44) modulo Pj{X) shows that Pj{X) \ gj{X). By definition, 

det[B„(A)] = dct ([5„i(A) , . . . , &„„(A)]) = dct {[bm{X)]"^^) 
= det ([ E ( n Pk{Xr''Agr{X)v,,,{X) 
Each term in the sum is divisible by Pj{X) except j' = j. Thus, by multi- linearity, 



rem(dct[S„(A)],Pj(A)) 



as claimed. □ 



X{pk{xr 

k^J 



[g,(A)]"det [V,iX)],p,iX)] ^ 0, 



Remark 13. It is possible for det[i3„(A)] to be non-constant; however, its irreducible 
factors will be distinct from pi{X), . . . ,pi{X). 

Remark 14. Rather than building Bn{X) as a linear combination (45), we may form 
B„{X) with columns 



9ijiX)vji{X), (1 < i < n), 



14 



where {gijYj^i solves the extended GCD problem 

I p I 

j=i L k^j 

The two properties proved above also hold for this definition of B„{X). This modification 
can significantly reduce the size of the coefficients in the computation when there is a 
wide range of Jordan chain lengths. But if Kji only changes slightly for 1 < i < n, this 
change will not significantly affect the total running time of the algorithm. 

3.3. Construction of Unimodular Multipliers (Step 3) 

Given [/i(A); . . . ; /„(A)] S isr[A]", we can compute the Hermite form (20) to obtain 
a unimodular matrix Q{\) such that, after reversing rows, Q(A)/(A) = [0; . . . ; 0; r(A)], 
where r = gcd(/i, . . . ,/«)• We apply this procedure to the last column of Bn{\) and 
define Vn{\) = Q{X)~^. The resulting matrix 

B„_i(A) :=K(A)-iB„(A) 

is zero above the main diagonal in column n. We then apply this procedure to the first 
n — 1 components of column n — 1 of _B„_i(A) to get a new Q(A), and define 



K-i(A) 



Q(A)- 



\0 ••• 



0^ 





TJ 



(46) 



It follows that i?„_2(A) Vn~i{X)^^ Bn~i{X) is zero above the main diagonal in columns 
71 — 1 and 71. Continuing in this fashion, we obtain unimodular matrices Vn{X), . . . , V2(A) 
such that 

A(A)S„(A) = A(A) K(A) • • • V2{X) V2{X)-^ ■ ■ ■ K(A)-iB„(A) = A(A)F(A)Bi(A), 

V(A) -B„-i(A) 

where V^(A) is unimodular, Si (A) is lower triangular, and 

det[Bi(A)] = const • det[S„(A)]. (47) 
The matrix V{X) puts A{X) in Smith form: 

Proposition 15. There is a unimodular matrix polynomial E{X) such that 

A{X)V{X) ^ E{X)D{X), (48) 

where D{X) is of the form (3). 

Proof. Let r,„i(A) denote the entry of Bi{X) in the Tilth row and ith column. Define yi{X) 
and Zi{X) to be the ith columns of A{X)V{X) and A{X)V {X)Bi{X)^ respectively, so that 

n 

z,(A) =7/,(A)r,,(A)+ 2/™(A)?-m^(A), il<i<n). (49) 

m— 2+1 



15 



By Proposition 12, Zi{X) is divisible by di{X) for 1 < i < n and Pj{X) \ det[-Bi(A)] for 
1 < j < L It follows that the diagonal entries rii{X) of i?i(A) are relatively prime to each 
of the c?i(A). As (i„(A) divides j/n(A)r„„(A) and is relatively prime to r„„(A), it divides 
yn{X) alone. Now suppose 1 < i < n and we have shown that dm (A) divides ym{X) for 
i < m < n. Then since di{X) divides dm (A) for m > i and rniX) is relatively prime to 
di(A), we conclude from (49) that di[X) divides yi{X). By induction, di{X) divides yi{X) for 
1 < i < n. Thus, there is a matrix polynomial E{X) such that (48) holds. Because 1^(A) 
is unimodular and det[^(A)] = const ■ det[D(A)], it follows that E{X) is also unimodular, 
as claimed. □ 

Remark 16. V{X) constructed as described above puts j4(A) in a global Smith form 
whether we build Bn{X) as a linear combination (45) or as in Remark 14. 

Remark 17. We can stop before reaching V2(A) by adding a test 

while dk I 

to the loop in which V{X) is constructed. When the loop terminates, we have V(X) = 
Vn{X) ■ ■ ■ Vfe+i(A), where k is the largest integer for which 

di(A) = --- = 4(A) = l. 

Note that k is known from the local Smith form calculations. The last n — k columns of 
Vn{X) ■ ■ ■ Vfc+i(A) are the same as those of y„(A) • • • V2(A); therefore, either can be used 
for V{X) as they contain identical Jordan chains. 

Remark 18. A slight modification of this procedure can significantly reduce the degree 
of the polynomials and the size of the coefficients in the computation. In this variant, 
rather than applying the extended GCD algorithm on 6n„(A) to find a unimodular matrix 
polynomial Q{X) so that Q(A)6„„(A) has the form [0; . . . ; 0; r(A)], we compute Q{X) that 
puts rem(6„„(A), d„(A)) into this form. That is, we replace the last column of Bn{X) with 
rem(6„„(A), d„(A)) before computing Q{X). To distinguish, we denote this new definition 
of Vn{X) ~ Q{X)~^ by Vn{X) and the resulting i3„_i(A) by i3„_i(A). Continuing in 
this manner, we find unimodular matrix polynomials V^(A), . . . , Vk+i{X) by applying the 
procedure on rem(6ii(A), c?i(A)) for i = n,...,fc+ 1, where bii{X) contains the first i 
components of column i of Bi{X) and k is defined as in Remark 17. We also define 

B^ = V,+iiXy' ■ ■ ■ K(A)-iB„(A), {k<t<n- 1). 

Note that in general, Bi{X) ^ Bi{X). It remains to show that this definition of V(X) = 
Vn{X) . . . Vfe+i(A), which satisfies 

A{X)B^iX) = A{X) K(A) • • • Vk+i{X) Vk+iW-' ■ ■ ■ K(A)-iS„(A) = A(A)y(A)-Bfc(A), 

V{\} B„-i(A) 

also puts A{X) in Smith form: 

Proposition 19. There is a unimodular matrix polynomial E{X) such that 

A{X)V{X) ^ E{X)D{X), (50) 

where D{X) is of the form (3). 
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Proof. Define ^^(A) = qao{bii{X),di{X));0 



e M ^ R"- for i = n, 



, fc + 1 , where 



e i?" *, bii{\) was defined above, and i?„(A) := Bn{X)- Tfien we liave 



B„_i(A) = y„(A)-MB„(A)- 



Tix (ri— 1) 



rfn(A)g„(A) 



= B„_i(A) - 



Onx(n-l) 



d„(A)K(A)-ig„(A) 



Tlie first n — 1 columns of -B„(A) are tlie same as those of i?„(A). Continuing, we have 



S„_2(A) = K-i(A)-^ (s„_i(A) 
= K_i(A)-i (b„_i(A)- [o 



nx (n— 2) 



(A)<7„-i(A) 



0„ 



nx (n — 2) 



d„_i(A)g„_i(A) d„(A)K(A)-ig„(A) 



nx (n— 2) 



= B„_2(A)- 
It follows by induction that 
S,(A) = yfe+i(A)-i fSfc+i(A) 



L(A)K-i(A)-ig„-i(A) d„(A)K-i(A)-V„(A)-ig„(A) 



0„ 



xfc 



rffc+i(A)gfe+i(A) 



0. 



X (n — fc— 1) 



(51) 



^,(A) 



0. 



xfc 



rf«(A)14+i(A)-i---K(A)-ig„(A) 



rffc+i(A)14+i(A)-igfe+i(A) 

-Bfc(A) is zero above the main diagonal in columns + 1 to n. Define 

w.(A) := Vfe+i(A)-i-.-y,(A)-ig,(A), (fc + 1 < i < n). 

Then the ith column of the difference -Bfc(A) — Bk{X) is (ij(A)iti(A) for fc + 1 < z < rt. 

Let fmi(A) denote the entry of -Bfc(A) in the mth row and ith column. Define yi(A) 
and Zi(A) to be the ith columns of A{X)V{X) and A{X)V{X)Bk{X), respectively, so that 



^.(A) 



yi{X)ni{X)+ ^ y7n{X)f,ni{X) 

m— z+l 



+ d,(A)A(A)y(A)ui(A), {k + l<i<n). 



By Proposition 12, Zi(A) is divisible by di{X) andpj(A) f det[i3„(A)] = const-det[i3i_i(A)] 
for k + 1 < i < n and 1 < j < I. As di divides dm ior i < m < n and since (51) holds with 
k replaced by i — 1 for A:+ 1 < i < n, det[i?i_i(A)] — det[i?i_i(A)] is divisible by c?i(A) due 
to multi-linearity of determinants. We also know that det[i3i_i(A)] is divisible by fii{X). 
Proof by contradiction shows that fu^X) is relatively prime to di{X) for fc + 1 < i < n. 
Then we argue by induction as in the proof of Proposition 15 to conclude that di{X) 
divides yi{X) for fc + 1 < i < n. It holds trivially for 1 < i < fc as di = • • • = = 1. 
Thus, there is a matrix polynomial E{X) such that (50) holds. Because V{X) is unimodular 
and dct[yl(A)] = const • dct[I?(A)], it follows that E(X) is also unimodular. □ 



4. Performance Comparison 

In this section, we compare our algorithm to Villard's method with good conditioning 
[2^ . which is another deterministic sequential method for computing Smith forms with 
multipliers, and to 'MatrixPolynomialAlgebra[SmithForm]' in Maple. All three algorithms 
arc implemented in exact arithmetic using Maple 13. The maximum number of digits 
that Maple can use for the numerator and denominator of a rational number (given by 
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'kernelopts(maxdigits)') is over 38 billion. However, limitations of available memory and 
running time set the limit on the largest integer number much lower than this. We use 
the variant of Algorithm 1 given in Appendix A to compute local Smith forms. 

To evaluate the performance of these methods, we generate several groups of diagonal 
matrices D{X) over Q and multiply them on each side by unimodular matrices of the 
form L(A)Z(A), where L(X) is unit lower triangular and Z[X) is unit upper triangular, 
both with off diagonal entries of the form X — i with i G {—10, . . . , 10} a random integer. 
As a final step, we apply a row or column permutation to the resulting matrix. We find 
that row permutation has little effect on the running time of the algorithms while column 
permutation reduces the performance of Villard's method. We compare the results in two 
extreme cases: (1) without column permutation and (2) with columns reversed. Each 
process is repeated five times for each D{X) and the median running time is recorded. 

We use several parameters in the comparison, including the size n of the square matrix 
A{X), the bound d of the polynomial degrees of the entries in A{X), the number I of 
irreducible factors in det[^(A)], and the maximal Jordan chain length Kj„. 

In Figure 4, we show the running time of three tests with linear irreducible factors 
of the form pj = X — Xj. As Villard's method and Maple compute the left and right 
multipliers U{X) and V^(A) while our algorithm instead computes E{X) and V^(A), we 
also report the cost of inverting E{X) to obtain U{X) at the end of our algorithm (using 
Maple's matrix inverse routine). This step could be made significantly faster by taking 
advantage of the fact that E{X) is unimodular. For example, one could store the sequence 
of elementary unimodular operations such that T{X) = (5m(A) • • ■ (5i(A)£'(A) is unit 
upper triangular. It would not be necessary to actually form the matrices T[X)~^ or 

[/(A) =T(A)-iQ™(A)---Qi(A) (52) 

as the right hand side can be applied directly to any vector polynomial using back 
substitution to solve T(A)a;(A) = z{X) in the last step. The same idea is standard in 
numerical linear algebra, where the L?7-decomposition of a matrix is less expensive to 
compute than its inverse, and is equally useful. In the first test of Figure 4, D„{X) is of 
the form 

D„{X) = diag[l, . . . , 1, A, A(A - 1), X'iX - l),X\X - l)% 
where the matrix size n increases, starting with n = 4. Hence, we have d = 8, I ^ 2, and 
Kin = 'v2n = 2 all fixed. (The unimodular matrices in the construction of A{X) each have 
degree 2.) For this test, inverting E{X) to obtain U{X) is the most expensive step of our 
algorithm. Without column permutation of the test matrices, our algorithm (with U{X)) 
and Villard's method have similar running times, both outperforming Maple's built-in 
function. With column permutation, the performance of Villard's method drops to the 
level of Maple's routine while our algorithm remains faster. For the second test, we use 
test matrices Di{X) of size 9x9, where / is the number of roots of det[A(A)]: 

Di{X) ^ diag[l, . . . , 1, 1[{X - j)], (1^1,2,...). 

i=i 

Thus, n = 9, d = Z + 4 and Hjn = 1 for 1 < j < I. This time the relative cost of inverting 
E{X) to obtain U{X) decreases with / in our algorithm, which is significantly faster than 
the other two methods whether or not we permute columns in the test matrices. In the 
third test, we use 9x9 test matrices Dk{X) of the form 

Dk{X) = diag[l, . . . , 1, (A - 1)'=], (fc = 1, 2, . . . ), 
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matrix size n number of roots I of det(A(A,)) maximal J. chain iength k 



Fig. 4. Comparison of running time of our algorithm (with or without computing (7(A)) to 
Villard's method, and to Maple's Smith form routine, on three families of test matrices. (Top 
row) without column permutation of test matrices. (Bottom row) with column permutation. 

with n = 9, I = 1, Kin = k and d = k + 4. We did not implement the re-use strategy 
for computing the reduced row-echelon form of Ak by storing the Gauss- Jordan trans- 
formations used to obtain rref(^fc_i), and then continuing with only the new columns 
of Ak- This is because the built-in function LinearAlgebra[ReducedRowEchelonForm] is 
much faster than can be achieved by a user defined Maple code for the same purpose. 
In a lower level language (or with access to Maple's internal code), this re-use strategy 
would decrease the running time of local Smith form calculations in this test from O(fc^) 
to 0{k^). A similar decrease in the cost of computing the left-multiplier U{X) ~ E{X)~^ 
could be achieved by computing T(A) in (52) instead. 

We also evaluate the performance on three test problems (numbered 4-6) with irre- 
ducible polynomials of higher degree. The results are given in Figure 5. In the fourth test, 
we use matrices _D„(A) similar to those in the first test, but with irreducible polynomials 
of degree 2 and 4. Specifically, we define 

Dn{X) = diag[l, . . . , 1,pi,piP2,pIp2,pIpI], (n = 4, 5, . . . ), 

where pi = + X + I, p2 = A'* + + + 1, ki„ = 2, K2n ~ 2, and d ~ 16. When 
the columns of the test matrices are permuted, our algorithm is faster than the other 
two methods whether or not U{X) is computed. When the columns are not permuted, 
computing U{X) causes our method to be slower than Villard's method. In this test, our 
algorithm would benefit from switching to the R/pR version of Algorithm 2 rather than 
the version over K described in the appendix. It would also benefit from computing T(A) 
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Fig. 5. Comparison of running times of the algorithms for three test problems in which the 
irreducible factors Pj(A) of the determinant are of degree greater than 1. (Top row) without 
column permutation of test matrices. (Bottom row) with column permutation. 

in (52) rather than the full inverse U{X) = E{X)^^. In the fifth test, we use 9x9 test 
matrices Dk{X) of the form 

Dkix) = diag[i, . . . , 1, n(A' + j), n(^' + J')'' n(^' + = 2, 3, . . . ), 

j=i i=i j=i 

with n = 9, Z = fc, Kjn = k and d = 2k^ + 4. Both the number of factors and maximal 
Jordan chain length increase with k. Our algorithm performs much better than the others 
when column permutations are performed on the test matrices. In the final test, we define 
n X n matrices 

n-2 

D^{\) ^ diag[l, 1, (A^ + 1), (A2 + \f{\^ + 2), . . . , n (A' + iT'^-'l (n = 3, 4, . . . ) 

so that all the parameters n, I = n — 2, Kjn = n — l—j and d = (n — 1) (n — 2) + 4 increase 
simultaneously. All three algorithms run very slowly on this last family of test problems. 



5. Discussion 

The key idea of our algorithm is that it is much less expensive to compute local Smith 
forms through a sequence of nuUspace calculations than it is to compute global Smith 
forms through a sequence of unimodular row and column operations. This is because (1) 
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Fig. 6. Running time of each step of our algorithm for the six test problems of Section 4. 

row reduction over R/pR in Algorithm 2 (or over K in Appendix A) is less expensive than 
computing Bezout coefficients over R; (2) the size of the rational numbers that occur in 
the algorithm remain smaller (as we only deal with the leading terms of A in an expansion 
in powers of p rather than with all of A); and (3) each column of V{X) in a local Smith 
form only has to be processed once for each power of p in the corresponding diagonal 
entry of D{X). Once the local Smith forms arc known, we combine them to form a (global) 
multiplier V^(A) for A{X). This last step docs involve triangularization of i?„(A) via the 
extended GCD algorithm, but this is less time consuming in most cases than performing 
unimodular row and column operations on A{X) to obtain D{X). This is because we only 
have to apply row operations to i?,i(A) (as the columns are already correctly ordered); 
we keep the degree of polynomials (and therefore the number of terms) in the algorithm 
small with the operation rcm(-, di); and the leading columns of B„{X) tend to be sparse 
(as they consist of a superposition of local Smith forms, whose initial columns X_i are a 
subset of the columns of the identity matrix). Sparsity is not used explicitly in our code, 
but it docs reduce the work required to compute the Bezout coefficients of a column. 

A detailed breakdown of the running time of each step of our algorithm is given 
in Figure 6. For each test in Section 4, we show only the case where columns of the 
test matrices arc permuted; the other case is similar. The step labeled "prime factors of 
dct(j4)" shows the time of computing the determinant and factoring it into prime factors. 
The step labeled "local Smith forms" could be made faster in tests 4-6 by working over 
R/pR (using Algorithm 2 rather than the variant in the appendix) as the irreducible 
factors Pj{X) have degree Sj > 1 in these tests. Also, although it is not implemented 
in this paper, this local Smith form construction would be easy to parallelize. The step 
labeled "matrix V" reports the time of computing V{X) from i3„(A). The cost of this 
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step is zero when there is only one irreducible factor in det[^(A)] as Bn{X) is already 
unimodular in that case. This happens when Z = 1 in the second test, in all cases in the 
third test, and when n = 3 in the last test. Finally, the step labeled "matrix i?" reports 
the time of computing £;(A) = A{X)V{X)D{\)-^ . 

The obvious drawback of our algorithm is that we have to compute a local Smith 
form for each irreducible factor of A (A) separately, while much of the work in deciding 
whether to accept a column in Algorithm 1 can be done for all the irreducible factors 
simultaneously by using extended GCDs. In our numerical experiments, it appears that 
in most cases, the benefit of computing local Smith forms outweighs the fact that there 
are several of them to compute. 



A. Alternative Version of Algorithm 2 

In this section we present an algebraic framework for local Smith forms of matrix 
polynomials that shows the connection between Algorithm 2 and the construction of 



canonical systems of Jordan chains presented in |25| . This leads to a variant of the 
algorithm in which row- reduction is done in the field K rather than in R/pR. 

Suppose i? is a principal ideal domain and p is a prime in R. M defined via M = i?" 
is a free i?-modulc with a free basis {(1, 0, . . . , 0), . . . , (0, . . . , 1)}. Suppose A : M M 
is a i?-modulc morphism. We define submodules 

Nk = {xe M : Ax is divisible by p''], {k > 0). (A.l) 

Then Nk is a free submodulc of M by the structure theorem Q for finitely generated 
modules over a principal ideal domain. (The structure theorem states that if AI is a free 
module over a principal ideal domain R, then every submodule of M is free.) The rank 
of Nk is also n, as p'^M C Nk C M. Note that Nq = M and 

Nk+i cNk, (fc > 0), (A.2) 
Nk+inpM^pNk, (fc>0). (A.3) 

Next we define the spaces Wk via 

Wk=Nk+i/pNk, (fc>-l), (A.4) 

where 7V_i := M so that W-i = M/pM. By (A.3), the action of R/pR on Wk is well- 
defined, i.e. Wk is a vector space over this field. Let us denote the canonical projection 
M M/pM by TT. Note that 7r(p7Vfc) = 0, so tt is well-defined from Wk to M/pM for 
A: > —1. It is also injective as xp G Nk+i =^ x e Nk, hy (A.3). Thus, cosets {xi, . . . ,Xm} 
are linearly independent in Wk iff {tt^xi), . . . , 7r(a;„i)} are linearly independent in M/pM . 
We define the integers 

rk = dimension of Wk over R/pR, {k > —1) (A. 5) 

and note that r_i = n and > iff there exists x € M such that p \ x and | Ax. 
We also observe that the truncation operator 

id:Wk+i^Wk:ix+pNk+i)^{x+pNk), {k > ^l) (A.6) 

is well-defined {pNk+i C pNk) and injective {x S Nk+2 and x S pNk a; g pNk+i, due 
to (A.3)). We may therefore consider Wk+i to be a subspace of Wk for k > —1, and have 
the inequalities 

Tk+i <rk, ik>-l). (A.7) 
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The case tq = is not interesting (as Nk = p^M for fc > 0), so we assume that ro > 0. 
Lemma 8 shows that when R = ^^[A]; which wc assume from now on, rg > is equivalent 
to the condition that det[A(A)] is divisible by p(A). We also assume that eventually 
decreases to zero, say 

rfe = <^ k > (3, (3 := maximal Jordan chain length. (A. 8) 

This follows from the assumption that det[74(A)] is not identically zero. It will be useful 
to define the index sets Ik = {i : n ~ r^-i + i i n — r^} for fc = 0, . . . , /3. 

Any matrix V = [xi, . . . , Xn] will yield a local Smith form AV ~ ED provided that 
Xi e Nk for i G Ik (0 < fc < /3) and the vectors 

{x, + pNk-i}.,ei, (A.9) 

form a basis for any complement Wk-i of Wk in W^-i- To see that p f det£^, we use 
induction on fc to show that the vectors 

{quo(Ax„p"o+M^}-rr' (A.io) 

are linearly independent in M/pM , where 

a» = fc, {ie Ik). (A.ll) 

Otherwise, a linear combination of the form * in Algorithm 1 would exist that belongs to 
Wk-i n Wk, a contradiction. The result that p| det£^ follows from Lemma 8. The while 
loop in Algorithm 1 is a systematic procedure for computing such a collection {a;i}ig/j., 
and has the added benefit of yielding a unimodular multiplier V. 

We now wish to find a convenient representation for these spaces suitable for compu- 
tation. Since p^^^M C pNk, we have the _R- module isomorphism 

Nk+i/pNk = (iVfc+i//+iAf)/(piVfc//+iM), (A.12) 

i.e. 

Wk^Wk/pWk-i, (fc>0), Wk-.^ Nk+i/p'^+'M, (fc>-l). (A.13) 

Although the quotient Wk/pWk-i is a vector space over R/pR, the spaces and 
M/p^~^^M are not. They are, however, modules over R/p^^^R and vector spaces over 
K. Note that A{X) induces a linear operator Afc on M/p^^^M with kernel 

Wfe = kcr Afc, (fc > -1). (A.14) 

We also define 

„ dimension of Wfc over ,, ^ , , , . , ,^ 

Rk = , (fc > -1, s = degp) (A.15) 

s 

so that = and 

i?fc = ro + --- + rfc, (fc>0), (A.16) 
where we used Wo = Wo together with (A.13) and the fact that as a vector space over 
K, dimWk = sr-fe. By (A.ll), Vk-i - rk = #{« : = fc}, so 

Rp-i = ro H h r^-i = (r_i - ro)0 + (ro - ri)l H h (r^_i - r^)/? 

= ai + • • • + q;„ = /i = algebraic multiplicity of p, 

where we used Theorem 7 in the last step. We also note that v := Ra = dimker(Ao) 
can be interpreted as the geometric multiplicity of p. 
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Equations (A. 13) and (A. 14) reduce the problem of computing Jordan chains to that of 
finding kernels of the linear operators over K. If we represent elements x £ M/p'^'^^M 
as lists of coefficients x^-''''™-' G K such that the components of x involve the terms 

a,0,;,myA'", 0<j<fc, l<l<n, 0<m<s-l, (A.18) 

then multiplication by A in M /p^^^M becomes the following linear operator on x^"^'^"*'^^ : 

h®S ^ 



I®ZI®S 

■•. ■■• 





S as in (10), 



Z 



A 
'•■ 



(A.19) 



I®ZI®Sj 



Here is a (fc + 1) x (fc + 1) block matrix, I ® S is a Kronecker product of matrices, 
5 and Z are s x s matrices, and / is n x n. Multiplication by A™ is represented by S™, 
which has a similar block- Toeplitz structure to Sfc for 2 < to < s — 1, but with 5* replaced 
by S*™ and Z replaced by 



Zm. — 







ET=,'s'zs' 



m = 

1 < TO < S — 1. 



(A.20) 



The matrix p(§fc)-' is a shift operator with identity blocks In® Is on the jth sub-diagonal. 
If we expand 

A{\) = A^^^ + + • ■ • + p'A^'?^ = + • • • + \'-^ A'^i''-^\ (A.21) 

the matrix Afc representing A{\) is given by 



Ao 
Ai Ao 



where 



Z^J71=0 ^ 



Ak Ak-i 





An 



(A.22) 



(A.23) 



J =0, 

^Erio [^^''"^ ^™ + ^(^"-1^") ® z„] , J > 1. 

This formula may be derived by observing that the matrix representation of the action of 
p(A)-' A'"A(-''^") on M/p^+'^M is block Toeplitz with A^^'™' ® S*™ on the jth sub-diagonal 
and a'-''™) ® Zra on the (j -I- l)st. Defining Aj this way avoids the need to compute 
remainders and quotients in subsequent steps (such as occur in Algorithm 2). 

Next we seek an efficient method of computing a basis matrix for the nullspace 
Wfc = ker Afc. Suppose fc > 1 and we have computed '^k-i - The first k blocks of equations 
in AfeXfc = imply there are matrices and Y^. such that = [Xj,„iUfe; Y^,], while 
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the last block of equations is 




The matrices Xfc can be built up recursively by setting Xq = Xo = Yo = null(j4o), 
defining Uq to be an empty matrix (with zero rows and sRq columns), and computing 



[Yfe;Ufc] = 



Ak = [Ak-i,[Ak,-.-.Ai]Xk-ij, (fc>l) 
[Yfc; Uk] — new columns of null(^fe) beyond those of null(^fe_i), 
Yk-i Yk\ Xk ^ [Xk-iUk;Yk], 

JUfc_i;0] Uk) ' Xfc = [i(Xfc-i),Xfe]. 

Here l : -> if^^C+i) represents muhiplication by p from M/p'-M to M/p'^^'^M: 

= [0;x(°);...;x('-i)], x^^'^ , G A' (A.25) 

By construction, Xfe = [i(Xfc_i),Xfc] is a basis for when k > 1; it follows that 
Xk + i(Wfc_i) is a basis for M^fc when Wk is viewed as a vector space over K. We define 
Xq = Xq and X-i = Isnxsn to obtain bases for Wq and W^_i as well. 

But we actually want a basis for Wk viewed as a vector space over R/pR rather than 
K. Fortunately, all the matrices in this construction are manipulated s x s blocks, and 
the desired basis over R/pR may be obtained by selecting the first column from each 
supercolumn (group of s columns) of Xk- Indeed, if [xi, . . . , Xs] is a supercolumn of Xk, 
we are able to prove that Xj — SkXj^i G t(Wfc_i) for 2 < j < s. Since §fc represents 
multiplication by A, these columns are all equivalent over R/pR. We are also able to 
prove that constructing Xk in this way (using the first column of each supercolumn) is 
equivalent to Algorithm 2, i.e. it yields the same unimodular matrix V{X) that puts A{X) 
in local Smith form. We omit the proof as it is long and technical, involving a careful 
comparison of nuUspace calculations via row-reduction in the two algorithms. 

In practice, this version of the algorithm (over K) is easier to implement, but the 
other version (over R/pR) should be about s times faster as the cost of multiplying two 
elements of R/pR is 0{s^) while the cost of multiplying two s x s matrices is O(s^). The 
results in Section 4 were computed as described in this appendix (over A' = Q). 
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